Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization
Authors
Abstract
We propose a new unsupervised sentence salience framework for Multi-Document Summarization (MDS), which consists of two components: latent semantic modeling and salience estimation. For latent semantic modeling, a neural generative model, the Variational Auto-Encoder (VAE), is employed to describe the observed sentences and their corresponding latent semantic representations. Neural variational inference is used for posterior inference of the latent variables. For salience estimation, we propose an unsupervised data reconstruction framework that jointly considers reconstruction in the latent semantic space and in the observed term vector space. We can therefore capture the salience of sentences from these two different and complementary vector spaces. The VAE-based latent semantic model is then integrated into the sentence salience estimation component in a unified fashion, and the whole framework can be trained jointly by back-propagation via multi-task learning. Experimental results on the benchmark datasets DUC and TAC show that our framework achieves better performance than the state-of-the-art models.
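The latent semantic modeling step named in the abstract can be illustrated with the two standard ingredients of a VAE objective: the reparameterized sample of the latent variable and the closed-form KL term against a standard normal prior. The sketch below is a minimal illustration, not the paper's implementation: all function names are hypothetical, and the squared-error reconstruction term is an assumption, since the abstract does not state which likelihood the authors use.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so the sample is a
    differentiable function of the encoder outputs mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Assumed reconstruction cost; a stand-in for the decoder likelihood."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def negative_elbo(x, x_hat, mu, log_var):
    """Loss to minimize: reconstruction cost plus the KL regularizer."""
    return squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.5], [0.0, 0.0]   # hypothetical encoder outputs
    z = reparameterize(mu, log_var, rng)    # latent sentence representation
    print(z, kl_to_standard_normal(mu, log_var))
```

In a full model, `mu` and `log_var` would come from an encoder network applied to a sentence's term vector, and `x_hat` from a decoder applied to `z`. The salience estimation component described in the abstract would add further reconstruction terms to this loss, which are not sketched here.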
Related papers
Multi-Document Summarization using Sentence-based Topic Models
Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian s...
Reader-Aware Multi-Document Summarization: An Enhanced Model and The First Dataset
We investigate the problem of reader-aware multi-document summarization (RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we extend a variational auto-encoder (VAE) based MDS framework by jointly considering news documents and reader comments. To evaluate summarization performance, we prepare a new dataset. We describe the methods for data collection, aspect...
Query-Focused Multi-Document Summarization Using Co-Training Based Semi-Supervised Learning
This paper presents a novel approach to query-focused multi-document summarization. As a good biased summary is expected to keep a balance among query relevance, content salience and information diversity, the approach first makes use of both the content feature and the relationship feature to select a number of sentences via co-training based semi-supervised learning, which can identify the...
Cascaded Attention based Unsupervised Information Distillation for Compressive Summarization
When people recall and digest what they have read for writing summaries, the important content is more likely to attract their attention. Inspired by this observation, we propose a cascaded attention based unsupervised model to estimate the salience information from the text for compressive multi-document summarization. The attention weights are learned automatically by an unsupervised data rec...
Reader-Aware Multi-Document Summarization via Sparse Coding
We propose a new MDS paradigm called reader-aware multi-document summarization (RA-MDS). Specifically, a set of reader comments associated with the news reports is also collected. The generated summaries from the reports for the event should be salient according to not only the reports but also the reader comments. To tackle this RA-MDS problem, we propose a sparse-coding-based method that is ab...